Semi-supervised Discovery of Informative Tweets During the Emerging Disasters

Authors

  • Shanshan Zhang
  • Slobodan Vucetic

Abstract

The first objective towards the effective use of microblogging services such as Twitter for situational awareness during emerging disasters is discovering the disaster-related postings. Given the wide range of possible disasters, relying on a pre-selected set of disaster-related keywords for this discovery is suboptimal. The alternative we focus on in this work is to train a classifier on the small set of labeled postings that becomes available as a disaster is emerging. Our hypothesis is that utilizing large quantities of historical microblogs could improve the quality of classification, compared to training a classifier on the labeled data alone. We propose to use unlabeled microblogs to cluster words into a limited number of clusters and to use the word clusters as features for classification. To evaluate the proposed semi-supervised approach, we used Twitter data from 6 different disasters. Our results indicate that when the number of labeled tweets is 100 or fewer, the proposed approach is superior to standard classification based on the bag-of-words feature representation. Our results also reveal that the choice of the unlabeled corpus, the choice of word clustering algorithm, and the choice of hyperparameters can have a significant impact on classification accuracy.

CCS Concepts: • Information systems → Data analytics
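The abstract describes the core pipeline (cluster words using unlabeled tweets, then represent each tweet by its word clusters rather than by individual words) without naming the clustering algorithm, and it notes that this choice matters. The following Python sketch illustrates the general idea under stated assumptions: each word is represented by counts of its immediate neighbors in an unlabeled corpus, and words are grouped with a cosine (spherical) k-means. This is a swapped-in technique chosen for brevity, not necessarily the one evaluated in the paper; the function names, the window size, `k`, and the toy corpus are all hypothetical.

```python
# Hypothetical sketch: the word-clustering method below is an assumption, not the paper's.
from collections import Counter, defaultdict
import math


def context_vectors(corpus, window=1):
    """For every word, count how often each other word occurs within `window` positions."""
    vecs = defaultdict(Counter)
    for text in corpus:
        tokens = text.lower().split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vecs[word][tokens[j]] += 1
    return dict(vecs)


def normalize(vec):
    """Scale a sparse vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {key: v / norm for key, v in vec.items()} if norm else dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(key, 0.0) for key, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_words(vecs, k, iters=10):
    """Cosine k-means over unit-length context vectors; returns {word: cluster_id}."""
    words = sorted(vecs)  # sorted so that initialization is deterministic
    if not 0 < k <= len(words):
        raise ValueError("k must be between 1 and the vocabulary size")
    unit = {w: normalize(vecs[w]) for w in words}
    centroids = [Counter(unit[w]) for w in words[:k]]  # seed with the first k words
    assign = {}
    for _ in range(iters):
        # Assign each word to its most similar centroid.
        assign = {w: max(range(k), key=lambda c: cosine(unit[w], centroids[c])) for w in words}
        # Recompute each centroid as the sum of its members' unit vectors.
        new = [Counter() for _ in range(k)]
        for w, c in assign.items():
            new[c].update(unit[w])
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [new[c] if new[c] else centroids[c] for c in range(k)]
    return assign


def cluster_features(tweet, word2cluster, k):
    """Bag-of-clusters features: token counts per word cluster; unseen tokens are skipped."""
    feats = [0] * k
    for tok in tweet.lower().split():
        c = word2cluster.get(tok)
        if c is not None:
            feats[c] += 1
    return feats


# Toy "historical" corpus standing in for a large set of unlabeled tweets (hypothetical).
unlabeled = [
    "flood water rising near the river",
    "earthquake shaking felt near the city",
    "fire smoke rising near the hills",
    "great game tonight at the stadium",
    "great concert tonight at the park",
]
w2c = cluster_words(context_vectors(unlabeled), k=3)
print(cluster_features("flood warning near the river", w2c, k=3))
```

Because the number of features equals the number of clusters `k` rather than the vocabulary size, a classifier trained on only a few dozen labeled tweets has far fewer weights to fit, which is one plausible reading of why such an approach could help when labels are scarce. Any standard classifier (for example, logistic regression) could then be trained on these vectors in place of bag-of-words features.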


Similar resources

Weakly Supervised Classification of Tweets for Disaster Management

Social media has quickly established itself as an important means that people, NGOs and governments use to spread information during natural or man-made disasters, mass emergencies and crisis situations. Given this important role, real-time analysis of social media contents to locate, organize and use valuable information for disaster management is crucial. In this paper, we propose self-learni...

Mining User Intents in Twitter: A Semi-Supervised Approach to Inferring Intent Categories for Tweets

In this paper, we propose to study the problem of identifying and classifying tweets into intent categories. For example, a tweet “I wanna buy a new car” indicates the user’s intent for buying a car. Identifying such intent tweets will have great commercial value among others. In particular, it is important that we can distinguish different types of intent tweets. We propose to classify intent ...

Tweeting Behaviour during Train Disruptions within a City

In a smart city environment, citizens use social media for communicating and reporting events. Existing work has shown that social media tools, such as Twitter and Facebook, can be used as social sensors to monitor events in real-time as they happen (e.g. riots, natural disasters and sport events). In this paper, we study the reactions of citizens in social media towards train disruptions withi...

Large-Scale Inference of Network-Service Disruption upon Natural Disasters

Large-scale natural disasters cause external disturbances to networking infrastructure that lead to large-scale network-service disruption. To understand the impact of natural disasters to networks, it is important to localize and analyze network-service disruption after natural disasters occur. This work studies an inference of network-service disruption caused by the real natural disaster, Hu...

A comparison between semi-supervised and supervised text mining techniques on detecting irony in greek political tweets

The present work describes a classification schema for irony detection in Greek political tweets. Our hypothesis states that humorous political tweets could predict actual election results. The irony detection concept is based on subjective perceptions, so only relying on human-annotator driven labor might not be the best route. The proposed approach relies on limited labeled training data, thu...

Journal:
  • CoRR

Volume: abs/1610.03750

Publication date: 2016